Widespread existence of uncorrelated probe intensities from within the same probeset on Affymetrix GeneChips

Authors

  • Olivia Sanchez-Graillet
  • Joanna Rowsell
  • William B. Langdon
  • Maria A. Stalteri
  • Jose M. Arteaga-Salas
  • Graham J. G. Upton
  • Andrew P. Harrison
Abstract

We have developed a computational pipeline to analyse large surveys of Affymetrix GeneChips, such as NCBI's Gene Expression Omnibus (GEO). GEO samples data for many organisms, tissues and phenotypes. Because of this experimental diversity, any observed correlations between probe intensities can be associated either with biology that is robust, such as common co-expression, or with systematic biases associated with the GeneChip technology. Our bioinformatics pipeline integrates the mapping of probes to exons, quality control checks on each GeneChip that identify flaws in hybridization quality, and the mining of correlations in intensities between groups of probes. The output from our pipeline has enabled us to identify systematic biases in GeneChip data. We are also able to use the pipeline as a discovery tool for biology. We have discovered that, in the majority of cases, Affymetrix probesets on Human GeneChips do not measure one unique block of transcription. Instead we see numerous examples of outlier probes. Our study has also identified that, in a number of probesets, the mismatch probes are an informative diagnostic of expression rather than a measure of background contamination. We report evidence for systematic biases in GeneChip technology associated with probe-probe interactions. We also see signatures associated with post-transcriptional processing of RNA, such as alternative polyadenylation.
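
The idea of mining correlations between probes in the same probeset can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the use of Pearson correlation on log2 intensities, the 0.5 threshold and the synthetic data are all assumptions chosen for illustration. The sketch computes the probe-by-probe correlation matrix across many samples and flags probes whose mean correlation with the rest of the probeset is low.

```python
import numpy as np


def probe_correlations(intensities):
    """Pearson correlation matrix between probes, on log2 intensities.

    intensities: array of shape (n_probes, n_samples), strictly positive.
    """
    log_i = np.log2(np.asarray(intensities, dtype=float))
    return np.corrcoef(log_i)  # each row is treated as one variable (probe)


def flag_outlier_probes(intensities, min_mean_corr=0.5):
    """Return indices of probes that correlate poorly with the others.

    min_mean_corr is an arbitrary illustrative threshold, not a value
    taken from the paper.
    """
    corr = probe_correlations(intensities)
    n_probes = corr.shape[0]
    # Mean correlation with the other probes (drop the diagonal of ones).
    mean_corr = (corr.sum(axis=1) - 1.0) / (n_probes - 1)
    return [i for i in range(n_probes) if mean_corr[i] < min_mean_corr]


def make_synthetic_probeset(n_samples=200, seed=0):
    """Hypothetical data: 10 probes tracking one transcript, 1 unrelated probe."""
    rng = np.random.default_rng(seed)
    signal = rng.normal(8.0, 1.5, n_samples)  # shared log2 expression level
    good = signal + rng.normal(0.0, 0.3, (10, n_samples))  # probe-level noise
    outlier = rng.normal(8.0, 1.5, (1, n_samples))  # independent of signal
    return 2.0 ** np.vstack([good, outlier])  # back to the intensity scale


if __name__ == "__main__":
    data = make_synthetic_probeset()
    print("Flagged probe indices:", flag_outlier_probes(data))
```

On this synthetic probeset the last probe (index 10) is generated independently of the shared signal, so it is the one expected to be flagged. Real surveys would also need the quality control and probe-to-exon mapping steps described in the abstract, which this sketch omits.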

Similar resources

Gene filtering by PCA for Affymetrix GeneChips

Due to the nature of the array experiments which examine the expression of tens of thousands of genes (or probesets) simultaneously, the number of null hypotheses to be tested is large. Hence multiple testing correction is often necessary to control the number of false positives. However, multiple testing correction can lead to low statistical power in detecting genes that are truly differentia...

Probes containing runs of guanines provide insights into the biophysics and bioinformatics of Affymetrix GeneChips

The reliable interpretation of Affymetrix GeneChip data is a multi-faceted problem. The interplay between biophysics, bioinformatics and mining of GeneChip surveys is leading to new insights into how best to analyse the data. Many of the molecular processes occurring on the surfaces of GeneChips result from the high surface density of probes. Interactions between neighbouring adjacent probes af...

Microarray labeling extension values: laboratory signatures for Affymetrix GeneChips

Interlaboratory comparison of microarray data, even when using the same platform, imposes several challenges to scientists. RNA quality, RNA labeling efficiency, hybridization procedures and data-mining tools can all contribute variations in each laboratory. In Affymetrix GeneChips, about 11-20 different 25-mer oligonucleotides are used to measure the level of each transcript. Here, we report t...

A benchmark for Affymetrix GeneChip expression measures

MOTIVATION The defining feature of oligonucleotide expression arrays is the use of several probes to assay each targeted transcript. This is a bonanza for the statistical geneticist, who can create probeset summaries with specific characteristics. There are now several methods available for summarizing probe level data from the popular Affymetrix GeneChips, but it is difficult to identify the b...

Motif effects in Affymetrix GeneChips seriously affect probe intensities

An Affymetrix GeneChip consists of an array of hundreds of thousands of probes (each a sequence of 25 bases) with the probe values being used to infer the extent to which genes are expressed in the biological material under investigation. In this article, we demonstrate that these probe values are also strongly influenced by their precise base sequence. We use data from >28 000 CEL files relati...

Journal title:
  • Journal of integrative bioinformatics

Volume 5, Issue 2

Pages: -

Publication date: 2008